1. Pick three observations and make predictions

In [6]:
observations = X.iloc[[0, 2, 16, 1337], :]
observations
Out[6]:
Potential Age Height(in cm) Weight(in kg) Weak Foot Rating Pace Total Shooting Total Passing Total Dribbling Total Defending Total ... Long Shots Aggression Interceptions Positioning Vision Penalties Composure Marking Standing Tackle Sliding Tackle
10157 75 22 180 79 2 71 33 48 57 64 ... 34 65 59 45 39 35 50 65 66 62
4894 73 26 183 73 2 67 62 70 72 69 ... 67 71 67 68 69 64 69 67 71 72
9210 67 27 185 68 3 68 51 59 63 63 ... 55 68 61 57 54 44 58 61 65 66
2968 75 24 176 72 2 82 63 67 73 66 ... 62 66 68 67 68 46 70 65 66 68

4 rows × 40 columns

In [ ]:
model.predict(observations)
Out[ ]:
array([1244., 9120., 3855., 4165.])

2. Calculate LIME explanations

In [ ]:
exp = explainer.explain_instance(observations.iloc[0, :], model.predict, num_features=5)
exp.show_in_notebook(show_table=True)
/home/quczer/anaconda3/envs/xai/lib/python3.9/site-packages/sklearn/base.py:450: UserWarning: X does not have valid feature names, but RandomForestRegressor was fitted with feature names
  warnings.warn(
Intercept 8436.219906243236
Prediction_local [10066.90813527]
Right: 1244.0
In [ ]:
exp = explainer.explain_instance(observations.iloc[1, :], model.predict, num_features=5)
exp.show_in_notebook(show_table=True)
Intercept 8372.939218917289
Prediction_local [15042.36951172]
Right: 9120.0
In [ ]:
exp = explainer.explain_instance(observations.iloc[2, :], model.predict, num_features=5)
exp.show_in_notebook(show_table=True)
Intercept 8441.603788060947
Prediction_local [7698.94590204]
Right: 3855.0
In [ ]:
exp = explainer.explain_instance(observations.iloc[3, :], model.predict, num_features=5)
exp.show_in_notebook(show_table=True)
Intercept 8405.173229618475
Prediction_local [16154.5064011]
Right: 4165.0

3. Explanations are stable for any given observation: recomputing the LIME explanation yields the same result. Across observations the feature importances do differ, but only slightly, so we could say that these local interpretations also give some global insight.
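The stability claim above can be sketched without the `lime` package itself. The snippet below (a minimal illustration on synthetic stand-ins for `X` and `model`, not the notebook's actual data) mimics the LIME recipe: perturb around one observation, fit a proximity-weighted linear surrogate, and check that a rerun with the same random seed reproduces the local coefficients exactly.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge

# Synthetic stand-ins for the notebook's X and model (hypothetical data).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = X[:, 0] * 3 + X[:, 1] ** 2 + rng.normal(scale=0.1, size=500)
model = RandomForestRegressor(random_state=0).fit(X, y)

def local_surrogate(x, seed, n_samples=1000, width=0.5):
    """LIME-style local explanation: weighted linear fit on perturbations."""
    rng = np.random.default_rng(seed)
    Z = x + rng.normal(size=(n_samples, x.size))       # perturb around x
    w = np.exp(-np.sum((Z - x) ** 2, axis=1) / width)  # proximity kernel
    surrogate = Ridge().fit(Z, model.predict(Z), sample_weight=w)
    return surrogate.coef_

x = X[0]
c1 = local_surrogate(x, seed=42)
c2 = local_surrogate(x, seed=42)
assert np.allclose(c1, c2)  # same seed -> identical local explanation
```

Note that LIME's sampling is stochastic, so this determinism holds only when the random seed is fixed between runs.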

4. SHAP

In [ ]:
display(observations.iloc[[0], :])
shap.plots.waterfall(shap_values[0], max_display=7)
Potential Age Height(in cm) Weight(in kg) Weak Foot Rating Pace Total Shooting Total Passing Total Dribbling Total Defending Total ... Long Shots Aggression Interceptions Positioning Vision Penalties Composure Marking Standing Tackle Sliding Tackle
10157 75 22 180 79 2 71 33 48 57 64 ... 34 65 59 45 39 35 50 65 66 62

1 rows × 40 columns

In [ ]:
display(observations.iloc[[1], :])
shap.plots.waterfall(shap_values[1], max_display=7)
Potential Age Height(in cm) Weight(in kg) Weak Foot Rating Pace Total Shooting Total Passing Total Dribbling Total Defending Total ... Long Shots Aggression Interceptions Positioning Vision Penalties Composure Marking Standing Tackle Sliding Tackle
4894 73 26 183 73 2 67 62 70 72 69 ... 67 71 67 68 69 64 69 67 71 72

1 rows × 40 columns

In [ ]:
display(observations.iloc[[2], :])
shap.plots.waterfall(shap_values[2], max_display=7)
Potential Age Height(in cm) Weight(in kg) Weak Foot Rating Pace Total Shooting Total Passing Total Dribbling Total Defending Total ... Long Shots Aggression Interceptions Positioning Vision Penalties Composure Marking Standing Tackle Sliding Tackle
9210 67 27 185 68 3 68 51 59 63 63 ... 55 68 61 57 54 44 58 61 65 66

1 rows × 40 columns

In [ ]:
display(observations.iloc[[3], :])
shap.plots.waterfall(shap_values[3], max_display=7)
Potential Age Height(in cm) Weight(in kg) Weak Foot Rating Pace Total Shooting Total Passing Total Dribbling Total Defending Total ... Long Shots Aggression Interceptions Positioning Vision Penalties Composure Marking Standing Tackle Sliding Tackle
2968 75 24 176 72 2 82 63 67 73 66 ... 62 66 68 67 68 46 70 65 66 68

1 rows × 40 columns

The main difference seems to be that SHAP values show how each particular feature shifts the prediction away from the average prediction for a given observation. LIME, on the other hand, fits a linear model under the hood, so it can only indicate whether a feature correlates positively or negatively with the prediction and what the magnitude of that effect is. Looking at the last example: for a low value (70) of reactions, SHAP lowers the estimate, while LIME just says that a higher value would result in a higher prediction (the average is hidden in the intercept).
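The "shifts the prediction away from the average" property can be verified on a toy case. For a linear model the exact Shapley value of feature j is known in closed form, `coef_j * (x_j - E[x_j])`, and the attributions plus the base value (the average prediction) must sum to the prediction. A minimal sketch, assuming synthetic data rather than the notebook's dataset:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical synthetic data, just to illustrate SHAP additivity.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
y = X @ np.array([2.0, -1.0, 0.5, 0.0]) + 3.0
lin = LinearRegression().fit(X, y)

x = X[0]
base = lin.predict(X).mean()            # E[f(X)], the "average value"
phi = lin.coef_ * (x - X.mean(axis=0))  # exact Shapley values for a linear f

# Additivity: base value + attributions recover the model's prediction.
assert np.isclose(base + phi.sum(), lin.predict(x[None])[0])
```

This is exactly the decomposition a `shap.plots.waterfall` chart draws: it starts at the base value and adds one `phi` bar per feature until it reaches the prediction.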

7. LinearRegression

In [ ]:
exp = explainer.explain_instance(observations.iloc[0, :], linear_model.predict, num_features=5)
exp.show_in_notebook(show_table=True)
X does not have valid feature names, but LinearRegression was fitted with feature names
Intercept 8828.20162645332
Prediction_local [10392.27248154]
Right: 10457.794068652118
In [ ]:
exp = explainer.explain_instance(observations.iloc[1, :], linear_model.predict, num_features=5)
exp.show_in_notebook(show_table=True)
Intercept 8966.151865049778
Prediction_local [17467.89932896]
Right: 19474.53320250925
In [ ]:
exp = explainer.explain_instance(observations.iloc[2, :], linear_model.predict, num_features=5)
exp.show_in_notebook(show_table=True)
Intercept 8988.546949948988
Prediction_local [5004.21289191]
Right: 6469.040567241638
In [ ]:
exp = explainer.explain_instance(observations.iloc[3, :], linear_model.predict, num_features=5)
exp.show_in_notebook(show_table=True)
Intercept 8880.021091326418
Prediction_local [19638.19597736]
Right: 19019.14318966618
  1. Linear Regression exhibits substantially different feature importances than the Random Forest. The most important feature is still the potential, but the rest are completely different. Since we are approximating a linear model with another linear model, it is no surprise that the importances are very similar across observations.
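The last point can be sketched directly: a local linear surrogate of a globally linear model recovers (nearly) the same coefficients no matter which observation we explain, so the local importances agree everywhere. The snippet below uses hypothetical synthetic data and a simplified LIME-style surrogate, not the notebook's `explainer` or `linear_model`:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Hypothetical noiseless linear data with known coefficients beta.
rng = np.random.default_rng(2)
X = rng.normal(size=(300, 4))
beta = np.array([5.0, -2.0, 1.0, 0.0])
linear_model = LinearRegression().fit(X, X @ beta)

def local_coefs(x, n=500, width=0.5, seed=0):
    """LIME-style surrogate: weighted linear fit on local perturbations."""
    rng = np.random.default_rng(seed)
    Z = x + rng.normal(size=(n, x.size))
    w = np.exp(-np.sum((Z - x) ** 2, axis=1) / width)
    fit = Ridge(alpha=1e-6).fit(Z, linear_model.predict(Z), sample_weight=w)
    return fit.coef_

c_a = local_coefs(X[0])
c_b = local_coefs(X[123])
assert np.allclose(c_a, c_b, atol=1e-2)   # same importances at distant points
assert np.allclose(c_a, beta, atol=1e-2)  # and they match the true coefficients
```

With a non-linear model such as the Random Forest from earlier sections, the second assertion has no analogue and the first generally fails, which is why its LIME explanations varied more between observations.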